{
 "metadata": {
  "name": "",
  "signature": "sha256:24236d7272c0fcb1cc62f24b7658659553615b8265e30f8921a0e1caca281472"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Logging data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from planout.ops.random import *\n",
      "from planout.experiment import SimpleExperiment\n",
      "import pandas as pd\n",
      "import json"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Log data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here we explain what all the fields are in the log data. Run this:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "class LoggedExperiment(SimpleExperiment):\n",
      "    def assign(self, params, userid):\n",
      "        params.x = UniformChoice(choices=[\"What's on your mind?\", \"Say something.\"], unit=userid)\n",
      "        params.y = BernoulliTrial(p=0.5, unit=userid)\n",
      "\n",
      "print LoggedExperiment(userid=5).get('x')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then open your terminal, navigate to the directory this notebook is in, and type:\n",
      "\n",
      "```\n",
      "> tail -f LoggedExperiment.log\n",
      "```\n",
      "\n",
      "You can now see how data is logged to your experiment as its run."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Exposure logs"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Whenever you request a parameter, an exposure is automatically logged. In a production environment, one would use caching (e.g., memcache) so that we only exposure log once per unit. `SimpleExperiment` exposure logs once per instance."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "e = LoggedExperiment(userid=4)\n",
      "print e.get('x')\n",
      "print e.get('y')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Manual exposure logging"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Calling `log_exposure()` will force PlanOut to log an exposure event. You can optionally pass in additional data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "e.log_exposure()\n",
      "e.log_exposure({'endpoint': 'home.py'})"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Event logging"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can also log arbitrary events. The first argument to `log_event()` is a required parameter that specifies the event type."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "e.log_event('post_status_update')\n",
      "e.log_event('post_status_update', {'type': 'photo'})"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Putting it all together"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We simulate the components of a PlanOut-driven website and show how data analysis would work in conjunction with the data generated from the simulation.\n",
      "\n",
      "This hypothetical experiment looks at the effect of sorting a music album's songs by popularity (instead of say track number) on a Web-based music store.\n",
      "\n",
      "Our website simulation consists of four main parts:\n",
      " * Code to render the web page (which uses PlanOut to decide how to display items)\n",
      " * Code to handle item purchases (this logs the \"conversion\" event)\n",
      " * Code to simulate the process of users' purchase decision-making\n",
      " * A loop that simulates many users viewing many albums"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "class MusicExperiment(SimpleExperiment):\n",
      "    def assign(self, params, userid, albumid):\n",
      "        params.sort_by_rating = BernoulliTrial(p=0.2, unit=[userid, albumid])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import random\n",
      "\n",
      "def get_price(albumid):\n",
      "    \"look up the price of an album\"\n",
      "    # this would realistically hook into a database\n",
      "    return 11.99"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Rendering the web page"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def render_webpage(userid, albumid):\n",
      "    'simulated web page rendering function'\n",
      "    \n",
      "    # get experiment for the given user / album pair.\n",
      "    e = MusicExperiment(userid=userid, albumid=albumid)\n",
      "    \n",
      "    # use log_exposure() so that we can also record the price\n",
      "    e.log_exposure({'price': get_price(albumid)})\n",
      "    \n",
      "    # use a default value with get() in production settings, in case\n",
      "    # your experimentation system goes down\n",
      "    if e.get('sort_by_rating', False):\n",
      "        songs = \"some sorted songs\" # this would sort the songs by rating\n",
      "    else:\n",
      "        songs = \"some non-sorted songs\"\n",
      "    \n",
      "    html = \"some HTML code involving %s\" % songs  # most valid html ever.\n",
      "    # render html"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Logging outcomes"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def handle_purchase(userid, albumid):\n",
      "    'handles purchase of an album'\n",
      "    e = MusicExperiment(userid=userid, albumid=albumid)\n",
      "    e.log_event('purchase', {'price': get_price(albumid)})\n",
      "    # start album download"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Generative model of user decision making"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def simulate_user_decision(userid, albumid):\n",
      "    'simulate user experience'\n",
      "    # This function should be thought of as simulating a users' decision-making\n",
      "    # process for the given stimulus - and so we don't actually want to do any\n",
      "    # logging here.\n",
      "    e = MusicExperiment(userid=userid, albumid=albumid)\n",
      "    e.set_auto_exposure_logging(False)  # turn off auto-logging\n",
      "    \n",
      "    # users with sorted songs have a higher purchase rate\n",
      "    if e.get('sort_by_rating'):\n",
      "        prob_purchase = 0.15\n",
      "    else:\n",
      "        prob_purchase = 0.10\n",
      "    \n",
      "    # make purchase with probability prob_purchase\n",
      "    return random.random() < prob_purchase"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Running the simulation"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "# We then simulate 500 users' visitation to 20 albums, and their decision to purchase\n",
      "random.seed(0)\n",
      "for u in xrange(500):\n",
      "    for a in xrange(20):\n",
      "        render_webpage(u, a)\n",
      "        if simulate_user_decision(u, a):\n",
      "            handle_purchase(u, a)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Loading data into Python for analysis"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Data is logged to `MusicExperiment.log`. Each line is JSON-encoded dictionary that contains information about the event types, inputs, and parameter assignments."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "raw_log_data = [json.loads(i) for i in open('MusicExperiment.log')]\n",
      "raw_log_data[:2]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It's preferable to deal with the data as a flat set of columns. We use this handy-dandy function Eytan found on stackoverflow to flatten dictionaries."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# stolen from http://stackoverflow.com/questions/23019119/converting-multilevel-nested-dictionaries-to-pandas-dataframe\n",
      "from collections import OrderedDict\n",
      "def flatten(d):\n",
      "    \"Flatten an OrderedDict object\"\n",
      "    result = OrderedDict()\n",
      "    for k, v in d.items():\n",
      "        if isinstance(v, dict):\n",
      "            result.update(flatten(v))\n",
      "        else:\n",
      "            result[k] = v\n",
      "    return result"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here is what the flattened dataframe looks like:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "log_data = pd.DataFrame.from_dict([flatten(i) for i in raw_log_data])\n",
      "log_data[:5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Joining exposure data with event data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We first extract all user-album pairs that were exposed to an experiemntal treatment, and their parameter assignments."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "all_exposures = log_data[log_data.event=='exposure']\n",
      "unique_exposures = all_exposures[['userid','albumid','sort_by_rating']].drop_duplicates()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Tabulating the users' assignments, we find that the assignment probabilities correspond to the design at the beginning of this notebook."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "unique_exposures[['userid','sort_by_rating']].groupby('sort_by_rating').agg(len)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can merge with the conversion data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "conversions = log_data[log_data.event=='purchase'][['userid', 'albumid','price']]\n",
      "df = pd.merge(unique_exposures, conversions, on=['userid', 'albumid'], how='left')\n",
      "df['purchased'] = df.price.notnull()\n",
      "df['revenue'] = df.purchased * df.price.fillna(0)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here is a sample of the merged rows. Most rows contain missing values for price, because the user didn't purchase the item."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df[:5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Restricted to those who bought something..."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df[df.price > 0][:5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Analyzing the experimental results"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.groupby('sort_by_rating')[['purchased', 'price', 'revenue']].agg(mean)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If you were actually analyzing the experiment you would want to compute confidence intervals."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}